Genetic diversity of Diaphorina citri (Hemiptera: Liviidae) unravels phylogeographic structure and invasion history of eastern African populations

Abstract The Asian citrus psyllid (Diaphorina citri Kuwayama) is a key pest of Citrus sp. worldwide, as it acts as a vector for Candidatus Liberibacter asiaticus, the bacterial pathogen that causes citrus Huanglongbing. Diaphorina citri has been reported in Kenya, Tanzania, and more recently in Ethiopia. This study assessed the genetic diversity and phylogeographic structure of the pest to gain insights into the potential sources of its introduction into Africa. Population structure and differentiation of D. citri populations from China, Ethiopia, Kenya, Tanzania, and the USA were assessed using 10 microsatellite loci. Additionally, five new complete mitogenomes of D. citri collected in China, Ethiopia, Kenya, Tanzania, and the USA were analyzed in the context of publicly available sequences. Genotype data grouped the D. citri populations from Kenya and Tanzania in one cluster, and those from Ethiopia formed a separate cluster. The two genetic clusters inferred from genotype data were congruent with mitochondrial sequence data. The mitogenomes from Kenya/Tanzania/China had 99.0% similarity, and the Ethiopia/USA had 99.9% similarity. In conclusion, D. citri populations in eastern Africa have different sources, as the Kenyan and Tanzanian populations probably originated from southeastern Asia, while the Ethiopian population most probably originated from the Americas.


| INTRODUC TI ON
Diaphorina citri Kuwayama (Hemiptera: Liviidae), commonly known as the Asian citrus psyllid, is currently found in tropical and subtropical Asia, Afghanistan, Saudi Arabia, Reunion, Mauritius, parts of South and Central America, the United States of America, Mexico and the Caribbean (Grafton-Cardwell et al., 2006), Tanzania (Shimwela et al., 2016), Kenya (Rwomushana et al., 2017) and Ethiopia (Ajene et al., 2020a) in East Africa, and Nigeria in West Africa (Oke et al., 2020). The feeding activities of D. citri cause direct damage to plants such as curling of the leaves (Grafton-Cardwell et al., 2006). Heavy citrus tree infestation by D. citri hampers the normal development of flush shoots, thus making them more likely to break off.
Additionally, the production of honeydew by the insect on the plant leads to sooty mold growth, which reduces the photosynthetic activity and productivity of trees. The most significant impact of D. citri on citrus, however, is the transmission of the phloem-limited bacterium Candidatus Liberibacter asiaticus (Las), the causal pathogen of Huanglongbing (Bové, 2006). Las is closely related to Candidatus Liberibacter americanus (Lam) and Candidatus Liberibacter africanus (Laf), a bacterium that is transmitted by Trioza erytreae (Hemiptera: Triozidae), a pest widespread in East and Southern African highlands (Rasowo et al., 2019). Las has been reported in Asia, and North and South America since 2004 (Grafton-Cardwell et al., 2013;Halbert & Manjunath, 2004;Hall et al., 2013). In Africa, Las was reported for the first time in Ethiopia in 2010 (Saponari et al., 2010) but its primary vector D. citri was not detected until 2020 (Ajene et al., 2020a).
Diaphorina citri has been shown, albeit experimentally, to transmit Laf in laboratory conditions (Massonie et al., 1976). Given the current context, the ongoing invasion of Africa by D. citri is a significant threat to citrus production on the continent.
Assessments of genetic diversity are useful in a wide range of scientific fields, including animal and plant breeding, evolution, conservation, pest control, and taxonomy (Wangari et al., 2013;Wu et al., 1999).
Knowledge on the genetic diversity of citrus psyllids across different geographical areas is useful for the assessment and management of risk and spread (Jantasorn et al., 2012). Different symptoms of citrus greening disease in various host species may be affected by the genetic diversity of the bacterial pathogen; for instance, symptoms of greening caused by Candidatus Liberibacter africanus are less severe than observed in trees infected by the subspecies Candidatus Liberibacter africanus subspecies clausenae (Ajene et al., 2020b). Furthermore, the movement and distribution of the insect vector in some geographical regions may also be affected by its genetic diversity (Jantasorn et al., 2012). The potential climatic suitability of D. citri in citrus growing areas in Africa poses a risk of rapid spread of citrus greening across the continent (Shimwela et al., 2016).
The genetic diversity of D. citri populations from different geographical regions has been assessed using mitochondrial cytochrome oxidase 1 (COI) sequences (De León et al., 2011;Boykin et al., 2012;Guidolin & Consoli, 2013;Lashkari et al., 2014), and showed that the presence of D. citri in the Americas resulted from two separate introductions (De León et al., 2011). Furthermore, two major mitochondrial groups have been identified in southwestern and southeastern Asia, and allowed for tracing the D. citri invasion of Florida, Texas, and Mexico to southwestern Asia (Boykin et al., 2012). The mitochondrial lineages of D. citri in Kenya and La Réunion showed a probable link between D. citri from Kenya and southeastern Asia (Wang et al., 2021). Also, a comparison of the microbiome and key endosymbionts of D. citri showed a close similarity between psyllids from China and those found in Kenya and Tanzania (Ajene et al., 2020c).
Simple sequence repeat markers, also known as microsatellites, are an important tool for determining genetic variation in populations (Sajib et al., 2012;Powell et al., 1996). Microsatellites are commonly used in genetic diversity studies, gene mapping (Ma et al., 2011;Sajib et al., 2012;Zhang et al., 2007), and genetic fingerprinting (Ma et al., 2011;Sajib et al., 2012;Xiao et al., 2006) because they are biparentally transmitted and generally highly polymorphic, and statistical tools for genotype data analyses are well developed and widely available. Microsatellite markers are employed to track possible geographical origin(s) of pest species (Boykin et al., 2007;Fraimout et al., 2015;Khamis et al., 2009;Ruíz-Rivero et al., 2021), and in our case, the markers have been used to infer the invasion pattern of introduction and spread of D. citri into Africa. This information can be instrumental in the development of appropriate management strategies and monitoring of expansions of the geographic range of this agricultural pest of economic importance. The notable rapid spread of D. citri in Tanzania and Kenya and its recent detection in Ethiopia indicate a high risk of invasion and establishment of the pest in other regions of eastern and Southern Africa. Therefore, gaining insights into the population structure and genetic diversity of African D. citri populations using microsatellite markers is relevant and opportune.
This study aimed to provide baseline genetic data for understanding the population dynamics and genetic structure of D. citri in eastern Africa, and to gain insights into the geographic origin of the invasion.

| Sample collection and DNA extraction
Field surveys were carried out in citrus orchards and backyard gardens in Ethiopia, Kenya, and Tanzania from March 2017 to December 2019. Specimens of D. citri were aspirated from host trees and stored individually in 96% ethanol until DNA extraction.
Geographic positioning system (GPS) coordinates of the sampling sites were recorded for each location using a Garmin eTrex20 in-

| Marker summary statistics and intrapopulation genetic diversity
Microsatellite data were prepared for analyses using the MS Tools add-in for Microsoft Excel (Park, 2001). Assessment of population genetic parameters was performed in the GenAlEx Add-in for Microsoft Excel (Peakall & Smouse, 2012). Genetic differentiation was estimated using standard genetic distances. Allele frequencies by population (AFP), heterozygosity, F-statistics, and polymorphism by population (HFP), allelic pattern graph (APT), private alleles by population (PAS), and samples with more than one private allele (PAL) were calculated in GenAlEx.

| Population structure analysis
An analysis of molecular variance (AMOVA) for testing different population groupings to determine whether the metapopulation is in panmixia and the component of genetic diversity attributable to variance between and within populations from the different countries, and F ST statistics were performed in Arlequin v3.5.2.2 (Excoffier & Lischer, 2010). Significance for the AMOVA was tested using 10,000 permutations. Exact tests for linkage disequilibrium and deviation from the Hardy-Weinberg equilibrium were conducted using the web version of GENEPOP (Raymond & Rousset 1995). Null alleles, stutters, and allelic dropout analysis were estimated using Micro- and MaxMedK). This method takes into account and corrects for uneven sampling. The most likely number of K that explained the structure in data was selected using the online resource STRUCTURE SELECTOR (Li & Liu, 2018), which calculates the estimators aid in the selection and visualization of the optimal K. The multivariate clustering approach was implemented using adegenet (Jombart, 2008) and poppr (Kamvar et al., 2014)    based on 10,000 simulated genotypes for each population and on a threshold probability value of 0.05 was used to identify the accurate exclusion/inclusion critical values to assign the individuals in the populations. Data format conversions for the various software programs were performed with CONVERT version 1.31 (Glaubitz, 2004).

| Mitogenome sequencing, assembly, and annotation
One adult specimen per country (China, Ethiopia, Kenya, Tanzania, and the USA) was used for next-generation sequencing (NGS) for the recovery of complete mitochondrial genome sequences.
Total DNA from five specimens was sequenced separately

| Comparison of mitogenome sequences
To assess genetic divergence and identify single nucleotide polymorphisms (SNPs) within the PCGs, rRNAs, and tRNAs among the D. citri mitogenomes, multiple sequence alignments of the sequences obtained in this study and 31 publicly available sequences (see Table   S2) were performed using the MAFFT algorithm (Katoh et al., 2013) available in Geneious Prime. Genetic distances among all sequences were calculated using nucleotide pairwise distances (p-distances) in MEGA X (Kumar et al., 2018) under the Kimura 2-parameter model (Kimura, 1980). Genetic distances among the mitogenomes were represented with multidimensional scaling analysis using the "cmdscale" function in R version 3.5.1 (R Development Core Team, 2008) on the genetic distance matrix to generate the plot for principal coordinate analysis (PCoA).

| Phylogeographic structure of D. citri
The phylogeographic structure of D. citri was assessed using maximum-likelihood (ML) trees and median-joining (MJ) networks.
The five COI and 16S rRNA sequences extracted from the complete mitogenome generated in this study and other COI (n = 573) and 16S rRNA (n = 249) sequences available on GenBank (see Table S3) were used to generate separate ML trees. Multiple sequence alignments were performed using MAFFT and used to construct the MJ network with Network software v5 (http://www.fluxu s-engin eering. com/share net.htm), under the default settings (Bandelt et al., 1999).
Genetic divergences among the COI haplogroups displayed in the MJ network were calculated as pairwise distances (p-distances) using MEGA X. ML trees constructed in MEGA X using the Tamura 3-parameter model (T92) (Tamura, 1992) Table S4).
Pairwise FST values were generally low and ranged from 0.012 to 0.305, and some of the lowest FST values were nonsignificant (

| Population structure
The AMOVA performed across all groups revealed high variance within the populations (80.5%) and low variance among populations (19.5%), indicating that the populations are in panmixia. The AMOVA performed on large-scale region groups (Africa separated from all other populations) showed that genetic differentiation among groups accounted for 9.8% of the total variance. All fixation indices, including FCT, FSC, FST, FIS, and FIT, were highly significant (p < .01) ( Table 3).
STRUCTURE analyses showed that the maximum value for the estimated likelihood of K was found at K = 2 (see Figure S1 in Appendix S1). The average values of ancestry probabilities (Q) for each population in the two clusters showed that the psyllids from Texas (USA) had the highest ancestry in one cluster (Q = 0.863), while the psyllids from Mikese (Tanzania) had the highest ancestry in the second cluster (Q = 0.708). Among African populations, the psyllids from Kenya and Tanzania grouped in one cluster, while those from Ethiopia were in the second cluster (see Table S5). Visualization of cluster membership coefficients showed that psyllids from China, Ethiopia, and the USA formed one cluster (red), while the psyllids from Kenya and Tanzania formed another cluster (green) (Figure 1).
DAPC showed a similar pattern of similarity between the Kenyan and the Tanzanian populations, while the psyllids from Ethiopia clustered with those from the USA (Figure 2).  (Table 4). The migration pattern displayed as a spanning tree network showed that Lungalunga was more closely linked to the Tanzanian populations than to the other Kenyan populations (Figure 3).

| Sequencing, mapping, and assembly of five new complete D. citri mitogenomes
The Illumina MiSeq run resulted in an average of 17.6 million reads for each sample, with an average read length of 151 bp. Average GC and AT content were 41% and 59%, respectively. Average Q20 (ratio of bases with phred quality score > 20) was 99%, and average Q30 (ratio of bases with phred quality score > 30) was 95% (see Figure S2). The number of assembled reads ranged between 41, 125 (Tanzania), and 183,235 (China), and sequence average coverage ranged from 414× (Tanzania) to 1845× (China) (see Figure S3).

| General mitogenomic features
The five D. citri mitogenomes had the typical Metazoan complement of 13 PCGs, 22 transfer tRNAs, two rRNAs, and an AT-rich noncoding region generally assumed to control transcription and replication. Gene order was identical to that of the D. citri mitogenomes publicly available and other species in the family Psyllidae, and identical to the hypothetical ancestral mitogenome organization in insects (Boore, 1999). Twenty-three genes were located on the majority strand (J-strand), and the other 14 genes on the minority strand (N-strand). The complete mitogenome sequences had an average total size of 14,995 bp, similar to that of the sequence (KU647697) used as a reference for mapping and assembly of NGS reads (14,996 bp).

| Protein-coding genes, noncoding AT-rich region, and intergenic and overlapping regions
The average combined length of the 13 PCGs was 10,798 bp and varied between ND5 (1618 bp) and ATP8 (153 bp) (see Table S6). ik e s e ( T a n z a n ia ) M la li ( T a n z a n ia ) S o in ( K e n y a ) T e x a s ( U S A ) noncoding region (902 bp), located between 12 s rRNA and the tRNA Ile -tRNA Gln -tRNA Met cluster, was annotated as the AT-rich region.

| Nucleotide composition and strand asymmetry
The complete mitogenomes had the high A+T content typical of insects, with values higher than 74.3% in all individual genes (see Table S7). The average A+T content of the AT-rich region (85.9%) was higher than the average for the complete sequence (74.4%), the combined tRNAs (75.1%), and the two rRNAs (76.9%). Nine PCGs (COII, ATP6, ATP8, COIII, ND3 ND6, CYTB, and ND2) had negative AT skews, and four (ND5, ND4, ND4L, and ND1) had positive AT skews. GC skews were negative for all PCGs.  The total number of substitutions between the two groups comprised 34 transitions and five transversions (see Table S8). Group 1 and Group 2 also differed in the nucleotide sequence in three  tRNAs: a U-C change in the TΨC in tRNA Phe , an additional A on the TΨC in tRNA Asn (Group 1), and a U-C change in the variable loop of tRNA Ser1 (see Figure S4).

| Phylogeographic structure and genetic diversity of D. citri
The The COI ML tree recovered three distinct branches (Branch I-III).
The new sequences, represented by 12 closely related haplotypes (H4-H15) seen in the MJ network, formed Branch I along with most sequences from Asia, the Americas, and the Caribbean. Branch II was also formed by sequences from Asia, the Americas, and the Caribbean. Branch III was formed by six haplotypes (H16-H21) exclusively from Asia and the Americas (Figure 6a).
The 16S rRNA ML tree recovered two distinct branches: the new sequences from China, Kenya, and Tanzania fell on the same branch with all sequences from Asia, while Ethiopia, the USA, and Pakistan formed the second branch (Figure 6a). The COI p-distances evidenced the divergence of the Ethiopian D. citri sequence, although this singleton haplotype was less diverged from the sequence from the USA (0.24%) than from the Kenyan, Tanzanian, and Chinese sequences (0.72%) (see Table S9). The principal coordinate analysis  such as Bactrocera correcta (Bezzi) and Bactrocera dorsalis (Hendel) in China (Aketarawong et al., 2007;Shi et al., 2012;Qin, et al., 2016) and Bactrocera invadens (Drew) in Africa (Khamis et al., 2009).
A recent study based on complete mitochondrial genomes of D.
citri concluded that the psyllid in California was likely not introduced from China but from somewhere in the southeastern USA (Wu et al., 2017). Based on COI sequences, the recently detected D. citri in Ethiopia showed a close phylogenetic relationship with D. citri from the Americas (Ajene et al., 2020a).  (Wu et al., 2017). Therefore, our results agree with the previous report of two genetic clusters associated with the USA and China, identified by mitogenome analyses (Wu et al., 2017).
The COI-based ML tree showed three main branches, and all sequences generated in this study clustered on Branch I, which also For example, COI together with ND5 has been used for phylogenetic studies of flies (Navajas et al., 1996), CytB has been shown to be better than COI at recovering population structure in Athetis lepigone (Chen et al., 2019), COIII has been used in intraspecific studies of Ixodes pacificus (Kain et al., 1999), and comparisons of complete mitogenomes showed that the most polymorphic regions of two Saturniidae species were located in ATP6, ND4, ND5, ND6, and CYTB (Langley et al., 2020 In conclusion, our study shows that the most probable introduction of D. citri into Kenya and Tanzania occurred from Asia, in accordance with the link between the Chinese D. citri population and the populations from Kenya and Tanzania based on the microbiome and endosymbiont composition of the psyllid (Ajene et al., 2020c). We also show that the D. citri from Ethiopia represented a distinct introduction event possibly originating from the Americas.
The clarification of the separate introduction routes of D. citri and Las in Africa may have implications on HLB regulatory strategies, in particular those aimed at preventing the establishment of Las. This information is of phytosanitary and quarantine regulatory importance for the management of HLB spread in the continent.
Accurate determination of the origin of D. citri populations in Africa and its population structure and diversity across a wider geographic range on the continent will provide further insights and hopefully strengthen the argument for stricter quarantine and regulatory policies.

ACK N OWLED G M ENTS
We gratefully acknowledge the following organizations and agencies for supporting this research: German Academic Exchange (DAAD).
This work received financial support from the German Federal

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The mitogenome sequence data that support the findings of this